Softmax policy gradient methods can take exponential time to converge
Abstract
The softmax policy gradient (PG) method, which performs gradient ascent under the softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing the global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take
$$\frac{1}{\eta}\,|\mathcal{S}|^{2^{\Omega\big(\frac{1}{1-\gamma}\big)}} \ \text{iterations}$$
to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.
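To make the update rule concrete, here is a minimal sketch (not taken from the paper) of one exact softmax PG iteration on a tabular MDP. The function name `softmax_pg_step` and its arguments are illustrative assumptions; the gradient expression is the standard softmax policy-gradient formula, $\frac{\partial V^{\pi_\theta}(\rho)}{\partial \theta(s,a)} = \frac{1}{1-\gamma}\, d_\rho^{\pi_\theta}(s)\, \pi_\theta(a|s)\, A^{\pi_\theta}(s,a)$, evaluated with exact policy evaluation.

```python
import numpy as np

def softmax_pg_step(theta, P, r, rho, gamma, eta):
    """One exact softmax policy-gradient ascent step on a tabular MDP.

    theta : (S, A) logits of the softmax policy
    P     : (S, A, S) transition probabilities
    r     : (S, A) rewards
    rho   : (S,) initial state distribution
    gamma : discount factor in [0, 1)
    eta   : stepsize
    """
    S, A = theta.shape
    # Softmax policy pi(a|s) from the logits (stabilized by max subtraction).
    pi = np.exp(theta - theta.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)

    # State-to-state transition matrix and expected reward under pi.
    P_pi = np.einsum("sa,sat->st", pi, P)          # (S, S)
    r_pi = (pi * r).sum(axis=1)                    # (S,)

    # Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi.
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum("sat,t->sa", P, V)   # (S, A)
    Adv = Q - V[:, None]                           # advantage A^pi(s, a)

    # Discounted state visitation d_rho^pi (normalized to sum to 1).
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)

    # Exact gradient of V^pi(rho) w.r.t. the logits, then ascent step.
    grad = (1.0 / (1 - gamma)) * d[:, None] * pi * Adv
    return theta + eta * grad, V @ rho
```

Repeatedly applying such an update with a constant stepsize $\eta$ is the procedure whose iteration count, on the paper's carefully-constructed three-action MDP, can scale as $\frac{1}{\eta}|\mathcal{S}|^{2^{\Omega(1/(1-\gamma))}}$ per the abstract above.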
Similar resources
Gradient Descent Can Take Exponential Time to Escape Saddle Points
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not...
Cold-Start Reinforcement Learning with Softmax Policy Gradient
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction. In this paper, we describe a reinforcement learning method based on a softmax value function that requires neither of these procedures. Our method combines the advantages of policy-gradient methods with the efficiency and simplicity ...
Policy Gradient Methods
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by gradient descent. It belongs to the class of policy search techniques that maximize the expected return of a policy in a fixed policy class while traditional value function approximation approaches derive policies from a value function. Policy gradient approaches have various a...
Policy Gradient Methods for Off-policy Control
Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ [13], converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would ...
Policy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning — in the form of a policy-gradient method — to thes...
Journal
Journal title: Mathematical Programming
Year: 2023
ISSN: 0025-5610, 1436-4646
DOI: https://doi.org/10.1007/s10107-022-01920-6